Replication archive for: "How densely do manufacturing establishments occupy land?"
by Kristian Behrens, Florian Mayneris, and Théophile Ndjanmou Biéda
forthcoming in the Review of Economics and Statistics


1. Structure: This replication archive contains the following folders:

	- clustering: The C++ code to cluster the densest dissemination areas in the CMAs to determine city centers

	- data_processed: Folder that contains the processed data files required for the regressions. Apart from two files
	  that we provide for convenience, database_for_desc.dta and database_for_reg.dta, this folder is initially empty
	  and gets populated while running the _MASTER.do file. These two files are the final files needed to run the
	  regression part of the _MASTER.do file. You can delete all intermediate files, but these two files must remain.

	- data_raw: Folder that contains all the raw data we use. It includes census data, parcel and building data, as well
	  as QC assessment roll data (infolot) and QC firm registry data (registraire du Québec). The subfolder /private is
	  empty in this archive; it is meant to hold the proprietary data that we cannot share. To obtain access to those
	  data, see Section 3 below.

	- dofiles: Contains all the dofiles required to build the main databases for the regressions and to replicate the
	  results in the paper. The first files (step1-step5) build the regression databases database_for_desc.dta
	  and database_for_reg.dta, as well as auxiliary files. These files are stored in data_processed. The remaining
	  files build the tables and figures, with one file per table/figure.
	  The main file is _MASTER.do, which runs through the whole process and constructs all results.

	- readme.txt: This file

	- results: Contains all tables (in /tables) and figures (in /figures). These files are created when running the
	  second part of the _MASTER.do file.

	- temp: Folder to store temporary files. Its contents can be deleted at will.

2. Software: We ran the code in Stata 16.1 (MP4) under macOS 12.6.2 on a MacBook Pro M1 with 16 GB of RAM. We also ran it on a Windows machine.

3. Restricted access data: We use the Scotts' National All data for 2001-2017. These data are commercial and must be bought from Scotts. Unfortunately, Scotts does not maintain historical data, which are thus only available from us. For us to share the data, the researcher must show that they have purchased a full subscription to the Scotts' Directories. Upon proof of such a subscription, we can share the data that belong in the /data_raw/private folder of the project. These data allow the researcher to run the whole empirical analysis.

4. Setup: Please unzip the replication archive into any folder on your machine. Then, open _MASTER.do and set the global macro workpath (referenced in the code as $workpath) to the path to which you have unzipped the archive. Everything should then run without further changes. Only Stata is required to replicate the main analysis. Reconstructing the raw data files underlying the analysis requires extensive GIS work with ArcGIS, QGIS, and/or Cartographica.
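
As an illustration of the setup step, the lines below sketch how the path could be set and how one might check that the two convenience files are in place. This is only a sketch: the folder path is hypothetical, the exact form of the corresponding line in _MASTER.do may differ, and the file check is an optional addition that is not part of the archive.

```stata
* Hypothetical path: replace it with the folder where you unzipped the archive.
global workpath "/Users/yourname/replication_archive"

* Optional check (not part of the archive): confirm that the two
* convenience files are present in data_processed.
foreach f in database_for_desc database_for_reg {
    capture confirm file "$workpath/data_processed/`f'.dta"
    if _rc {
        display as error "Missing file: data_processed/`f'.dta"
    }
}
```

Forward slashes in the path work in Stata on both macOS and Windows, so the same form can be used on either system.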